
Fix documentation typos and grammar errors #13801

Open

DimitriPapadopoulos wants to merge 1 commit into pypa:main from DimitriPapadopoulos:copilot/fix-documentation-issues

Conversation

@DimitriPapadopoulos (Contributor) commented Feb 11, 2026

Corrects grammar errors, typos, and duplicate words in documentation files.

Requesting to skip news.

Initially created from Copilot CLI via the copilot delegate command.

@DimitriPapadopoulos changed the title from "Copilot/fix documentation issues" to "Fix documentation typos and grammar errors" on Feb 11, 2026
@DimitriPapadopoulos force-pushed the copilot/fix-documentation-issues branch from c5b020a to ad16777 on February 11, 2026 21:11
@ichard26 (Member) left a comment:

Thanks!

(cc @notatallshaw, do we want to ask that the AI authorship be removed?)

@notatallshaw (Member) left a comment:

Most of the changes are fine. But yeah I have two issues with the use of LLMs here:

  1. I'm not sure how much value fly-by typo and grammar fixes have: you don't learn anything about how to contribute to pip, and if typos are a real concern we should probably automate suggestions on a regular basis.

  2. Because of the way the commits are structured, when we generate the authors file @DimitriPapadopoulos will not be considered an author; instead, copilot-swe-agent[bot] will be the "author". While I have zero problem with using LLMs to assist in software engineering, I don't understand why a PR submitter would not want to be considered the author of the commits. It also leaves a lot of other open questions, to be discussed elsewhere, if the PR submitter is not the author of the commits.

So yes, at least for now, I'm going to ask PR submitters to have commits that are their own.

So please update this PR to use commits where you are the author (see the sketch below), and make the change requested below.
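
For reference, re-recording yourself as the author of an existing commit, as requested above, typically looks something like the sketch below. This is a minimal sketch, not pip's documented workflow: it assumes a single commit, reuses the branch name from this PR, and the final `git shortlog` line only illustrates how an authors list is commonly derived from commit metadata.

```console
# Minimal sketch: assumes one commit currently authored by copilot-swe-agent[bot]
# and a branch named copilot/fix-documentation-issues.
git commit --amend --no-edit --reset-author      # re-record yourself as the author
git push --force-with-lease origin copilot/fix-documentation-issues

# Authors lists are commonly generated from commit metadata, for example:
git shortlog -se                                 # authors with commit counts and emails
```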

@notatallshaw added the "skip news" label (Does not need a NEWS file entry, e.g. trivial changes) on Feb 12, 2026
@DimitriPapadopoulos (Contributor, Author) commented Feb 12, 2026

I agree LLM fixes should be automated, but I am not sure yet how best to achieve that. Perhaps in separate PRs (after a discussion in issues)?
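
One possible shape for such automation, sketched here purely as an assumption rather than an agreed approach, is to run a spell checker such as codespell over the documentation in CI or a pre-commit hook. The paths and flags below are illustrative, not pip's actual configuration.

```console
# Illustrative only: assumes codespell is acceptable to the project and that
# docs/ is the directory to check; the skip patterns are placeholders.
python -m pip install codespell
codespell docs/ --skip "*.svg,*.png" --write-changes
```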

I don't know how it works in other domains, but research ethics requires transparency about AI tool usage. See for example "The ethics of using artificial intelligence in scientific research: new guidance needed for a new tool". I thought a good way to achieve that is not to endorse commits as your own when they were actually written by Copilot. Personally, I would insist that commits written by Copilot appear as such. From a practical point of view, future training of LLM models would probably benefit from such a distinction between commits written by humans and commits written by an AI. If you insist, I could endorse the commits but clearly state Co-authored-by: <some AI>, but I think that is suboptimal because Copilot wrote the commits while I reviewed and modified them using prompts.
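
For concreteness, Co-authored-by is a plain commit-message trailer. A hedged example of adding one to an existing commit is shown below; the name and email are placeholders, not an attribution format endorsed by pip.

```console
# Placeholder trailer values only; --trailer requires Git 2.32 or later.
git commit --amend --no-edit \
  --trailer "Co-authored-by: Copilot <placeholder@example.com>"
```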

@pfmoore (Member) commented Feb 12, 2026

I am a strong -1 on having LLMs appear in the Authors file. A line in the authors file doesn’t provide any nuance on our policies. There are many people with strong opinions on the use of LLMs, and to those people, that would signal “pip accepts vibe-coded junk”. IMO, we don’t want to deal with the sort of publicity that would generate, justified or not.

Regarding the actual contribution. Did you review every change yourself, and confirm that you personally agree with it? If so, then it seems to me that the changes are your work, and can be credited to you. If you disagree, or if you didn’t review everything, then I don’t think the PR meets our criteria for LLM use, and should therefore be rejected in any case.

Also, I agree with @ichard26 that I'm not sure of the value in fly-by typo fixes, particularly when they don't act as "practice" for the contributor, which LLM-generated ones clearly don't.

And as a final note, I'll point out that reviewing this PR has probably taken far more time due to the LLM usage than a hand-coded contribution would have, so that usage was almost certainly a net loss in productivity for the project at this point.

@pfmoore (Member) left a comment:

The commit message here is needlessly verbose, replicating the changes made in detail. This type of message adds no value and feels typical of the verbosity LLMs introduce. Please fix.

@DimitriPapadopoulos (Contributor, Author) commented:

I did review the commits carefully, and in some cases asked Copilot to revert or fix changes.

I disagree about the net loss. At some point we (the project, me, others) have to address the use of LLMs, but I totally agree about the verbosity of the resulting commits...

Is it OK to use Co-authored-by: <some AI>?

@DimitriPapadopoulos force-pushed the copilot/fix-documentation-issues branch 2 times, most recently from d254cb8 to 4afffa0, on February 12, 2026 09:34
@pfmoore (Member) commented Feb 12, 2026

Is it OK to use Co-authored-by: <some AI>?

Not for me. Be prepared to take full personal responsibility for your PR, or don't submit it. You don't say "Co-authored-by: VS Code spell checker", do you?

I'll let the other pip maintainers add their own views, though. I'm just giving my personal POV.

@DimitriPapadopoulos (Contributor, Author) commented Feb 12, 2026

It's not that I don't want to take full personal responsibility for this PR. It's just that, in the context of scientific writing, it would be considered unethical to hide that parts of a scientific paper were generated using AI. The reason is that AI is a game changer (much more so than tools like codespell 😄), and we want to avoid nonsensical (for now?) scientific papers written by (instead of with help from) AIs. But then, seeing AI as a mere tool makes sense too.

I will remove the Co-authored-by: mention if you maintainers insist on it.

@DimitriPapadopoulos (Contributor, Author) commented Feb 12, 2026

In the same way that a compiler inserts into object files a comment like [GCC 13.3], I feel it is a good idea to document which AI generated Python-level code and comments from prompts. I realise that I am shifting the discussion well beyond the scope of a mere PR, and that no definitive answers can be given here; I am just trying to explain my perspective.
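
For comparison, the compiler comment mentioned above can be inspected in an ELF object file roughly as follows; the file name and version string are examples, not taken from this PR.

```console
# Example only: dump the .comment section that a GCC-built ELF object carries.
readelf -p .comment example.o
# String dump of section '.comment':
#   [     0]  GCC: (GNU) 13.3.0
```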

@pfmoore (Member) commented Feb 12, 2026

Understood, and I don't want to derail this discussion (as you say, it's a much wider question) but in the context of reproducibility, without the prompts (or even with them, see below) the generated code is literally all anyone has, and therefore what is relevant is:

  1. The generated code (needed to build pip), and
  2. The name of the person responsible for that code, for audit and tracking purposes.

On that second point, if there's a bug¹, it's important to be able to reach out to the original author and ask for the reasoning behind the code. Or, if the code is malicious, to review and potentially remove any other code by that author. If all the author can say is "but that code was generated by an LLM, so I can't explain the reasoning", we have a maintainability problem.

To address your other point, I think the position around scientific writing isn't necessarily the same as the position for code contributions - especially with the prevalence in software of "vibe coding", where the submitter may well not have even read the code they are submitting as a PR. But even in science, surely reproducibility of results is crucial? How could a paper that claimed an LLM determined some result based on a set of evidence be credible, if it wasn't possible to reproduce that line of argument by re-running the LLM interaction? And as LLMs change their training data regularly, even just using the same prompts and the same LLM isn't a guarantee of the same results.

I'll try to resist the temptation to engage further on this digression. LLM use is something I have relatively strong but incomplete opinions on, and I don't want to dominate the discussions here with my views. So please don't be offended (or assume that I agree 😉) if I ignore any further responses you make.

Footnotes

  1. Not likely with a typo correction, obviously, but the general principle is what matters to me here.

@DimitriPapadopoulos force-pushed the copilot/fix-documentation-issues branch from 4afffa0 to 2029ac6 on February 12, 2026 11:29

Labels

skip news Does not need a NEWS file entry (eg: trivial changes)
